Reducing Latency, Power, and Gate Count with the Tensilica Floating-Point FMA
Abstract
Today’s digital signal processing applications, such as radar, echo cancellation, and image processing, demand greater dynamic range and computation accuracy. Floating-point arithmetic units offer better precision, higher dynamic range, and shorter development cycles than fixed-point arithmetic units, and minimizing a design’s time to market is more important than ever. Algorithm developers use MATLAB to develop and test their ideas, which are mostly based on floating-point arithmetic. However, digital signal processor (DSP) programmers port these algorithms to fixed-point arithmetic because floating-point arithmetic units are considerably larger, slower, and more power-hungry than fixed-point units. This porting is not a trivial effort, as programmers must verify the results, including the error rate (accuracy) of the fixed-point algorithm against its floating-point counterpart. Furthermore, fixed-point code often requires more cycles than the floating-point version of the same algorithm. For example, on the Cadence® Tensilica BBE32EP DSP, a 4x4 matrix Cholesky decomposition takes 18 cycles in fixed point but 15 cycles in floating point. As this example illustrates, it makes sense to keep using floating-point computation units when an application requires greater dynamic range and accuracy. To overcome some of the drawbacks of floating-point arithmetic units, Cadence has developed an innovative, patented design.
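To illustrate why floating-point code tends to be shorter and simpler to port than its fixed-point counterpart, the C sketch below contrasts a single multiply-accumulate step written with the standard fma() call against a hand-scaled fixed-point version. The Q15 format, saturation policy, and function names are illustrative assumptions and are not taken from the BBE32EP instruction set.

#include <math.h>
#include <stdint.h>

/* Floating-point multiply-accumulate: one fused operation, no manual
 * scaling, and the full dynamic range of the binary64 format. */
static double mac_float(double acc, double a, double b)
{
    return fma(a, b, acc);          /* single rounding */
}

/* Fixed-point Q15 multiply-accumulate (illustrative): the programmer
 * must track the binary point, rescale the product, and guard against
 * overflow by hand -- the extra work that porting from MATLAB entails. */
static int32_t mac_q15(int32_t acc, int16_t a, int16_t b)
{
    int32_t prod = ((int32_t)a * (int32_t)b) >> 15;  /* Q15 * Q15 -> Q15 */
    int64_t sum  = (int64_t)acc + prod;
    if (sum > INT32_MAX) sum = INT32_MAX;            /* saturate */
    if (sum < INT32_MIN) sum = INT32_MIN;
    return (int32_t)sum;
}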
Similar Resources
Design of Gate-Driven Quasi Floating Bulk OTA-Based Gm–C Filter for PLL Applications
Advances in integrated circuit design have increased the demand for low-voltage portable analog devices, which in turn has raised the need for low-power RF transceivers. A low-power phase-locked loop (PLL) is always desirable to meet this need. This paper deals with the design of a low-power transconductance-capacitance (Gm-...
Mixed-precision Fused Multiply and Add
The standard floating-point fused multiply and add (FMA) computes R=AB+C with a single rounding. This article investigates a variant of this operator where the addend C and the result R are of a larger format, for instance binary64 (double precision), while the multiplier inputs A and B are of a smaller format, for instance binary32 (single precision). With minor modifications, this operator is...
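As an aside on how such a mixed-precision operator behaves, a binary32 × binary32 product has at most 48 significand bits and is therefore exactly representable in binary64, so on a machine with a double-precision FMA the operator can be emulated with a single rounding. A minimal C sketch under that assumption (the function name is ours):

#include <math.h>

/* R = A*B + C with A, B in binary32 and C, R in binary64, one rounding.
 * The float product is exact in double (24 + 24 significand bits <= 53),
 * so the double-precision fma() performs the single final rounding. */
double fma_mixed(float a, float b, double c)
{
    return fma((double)a, (double)b, c);
}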
Analyzing Two-Term Dot Product of Multiplier Using Floating Point and Booth Multiplier
The floating-point two-term dot-product multiplier is referred to as a discrete design. Floating point is widely used to increase accuracy, speed, and performance while reducing delay, area, and power consumption. It is applied in digital signal processing and graphics algorithms. Many floating-point applications aim to reduce area; from the survey, the fuse...
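For context, the two-term dot product in question evaluates a·b + c·d. A purely discrete design rounds each multiply and the add separately, whereas an FMA folds one product into the accumulation and saves a rounding. A minimal C sketch under those assumptions (function names are ours):

#include <math.h>

/* Discrete two-term dot product: two multiplies and an add,
 * three rounding errors in total. */
double dot2_discrete(double a, double b, double c, double d)
{
    return a * b + c * d;
}

/* FMA-based version: c*d is rounded once, then fma(a, b, cd)
 * adds a*b exactly before a single final rounding. */
double dot2_fma(double a, double b, double c, double d)
{
    return fma(a, b, c * d);
}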
Error bounds on complex floating-point multiplication with an FMA
The accuracy analysis of complex floating-point multiplication done by Brent, Percival, and Zimmermann [Math. Comp., 76:1469–1481, 2007] is extended to the case where a fused multiply-add (FMA) operation is available. Considering floating-point arithmetic with rounding to nearest and unit roundoff u, we show that their bound √5 u on the normwise relative error |ẑ/z − 1| of a complex product z c...
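To make the setting concrete, an FMA-based complex product evaluates each component with one ordinary multiply and one fused multiply-add, as in the hedged C sketch below; the error bound quoted above comes from the cited analysis, not from this toy code.

#include <math.h>
#include <complex.h>

/* Complex product (a + ib)(c + id) with an FMA in each component:
 * only one of the two partial products per component is rounded
 * before the final fused operation. */
double complex cmul_fma(double a, double b, double c, double d)
{
    double re = fma(a, c, -(b * d));   /* ac - bd */
    double im = fma(a, d,  b * c);     /* ad + bc */
    return re + im * I;
}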
A Survey on Floating Point Adders
Addition is the most complex operation in a floating-point unit and can cause major delays while requiring significant area. Over the years, the VLSI community has developed many floating-point adder algorithms aimed primarily at reducing overall latency. An efficient floating-point adder design offers major area and performance improvements for FPGAs. This paper studies the impleme...
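As background for why the adder dominates latency, the classic single-path addition algorithm performs exponent comparison, significand alignment, addition, normalization, and rounding in sequence. The toy C model below (our own simplification) handles only positive, normalized operands and truncates instead of rounding; it is meant only to show where the long-latency alignment and normalization shifts come from.

#include <stdint.h>

/* Toy model of floating-point addition for two positive, normalized
 * operands (significand in [1,2) stored with an explicit leading 1 in
 * bit 52).  Signs, subnormals, NaN/Inf, and correct rounding are omitted;
 * the point is the sequence of steps a hardware adder must perform. */
typedef struct { int exp; uint64_t sig; } fp_t;   /* unpacked format */

static fp_t fp_add(fp_t x, fp_t y)
{
    /* 1. Exponent compare and operand swap so that x.exp >= y.exp. */
    if (y.exp > x.exp) { fp_t t = x; x = y; y = t; }

    /* 2. Alignment: right-shift the smaller significand (big shifter). */
    int d = x.exp - y.exp;
    uint64_t ysig = (d < 64) ? (y.sig >> d) : 0;

    /* 3. Significand addition. */
    fp_t r = { x.exp, x.sig + ysig };

    /* 4. Normalization: a carry-out means shifting right by one and
     *    bumping the exponent (the subtraction path would instead need
     *    a leading-zero count and a left shift). */
    if (r.sig >> 53) { r.sig >>= 1; r.exp += 1; }

    /* 5. Rounding would follow here (truncation in this toy model). */
    return r;
}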